object-oriented architecture
Partially-Supervised Image Captioning
Image captioning models are becoming increasingly successful at describing the content of images in restricted domains. However, if these models are to function in the wild --- for example, as assistants for people with impaired vision --- a much larger number and variety of visual concepts must be understood. To address this problem, we teach image captioning models new visual concepts from labeled images and object detection datasets. Since image labels and object classes can be interpreted as partial captions, we formulate this problem as learning from partially-specified sequence data. We then propose a novel algorithm for training sequence models, such as recurrent neural networks, on partially-specified sequences which we represent using finite state automata. In the context of image captioning, our method lifts the restriction that previously required image captioning models to be trained on paired image-sentence corpora only, or otherwise required specialized model architectures to take advantage of alternative data modalities. Applying our approach to an existing neural captioning model, we achieve state of the art results on the novel object captioning task using the COCO dataset. We further show that we can train a captioning model to describe new visual concepts from the Open Images dataset while maintaining competitive COCO evaluation scores.
Cooperative Holistic Scene Understanding: Unifying 3D Object, Layout, and Camera Pose Estimation
Holistic 3D indoor scene understanding refers to jointly recovering the i) object bounding boxes, ii) room layout, and iii) camera pose, all in 3D. The existing methods either are ineffective or only tackle the problem partially. In this paper, we propose an end-to-end model that simultaneously solves all three tasks in real-time given only a single RGB image. The essence of the proposed method is to improve the prediction by i) parametrizing the targets (e.g., 3D boxes) instead of directly estimating the targets, and ii) cooperative training across different modules in contrast to training these modules individually. Specifically, we parametrize the 3D object bounding boxes by the predictions from several modules, i.e., 3D camera pose and object attributes. The proposed method provides two major advantages: i) The parametrization helps maintain the consistency between the 2D image and the 3D world, thus largely reducing the prediction variances in 3D coordinates.
Object-Oriented Dynamics Predictor
Generalization has been one of the major challenges for learning dynamics models in model-based reinforcement learning. However, previous work on action-conditioned dynamics prediction focuses on learning the pixel-level motion and thus does not generalize well to novel environments with different object layouts. In this paper, we present a novel object-oriented framework, called object-oriented dynamics predictor (OODP), which decomposes the environment into objects and predicts the dynamics of objects conditioned on both actions and object-to-object relations. It is an end-to-end neural network and can be trained in an unsupervised manner. To enable the generalization ability of dynamics learning, we design a novel CNN-based relation mechanism that is class-specific (rather than object-specific) and exploits the locality principle. Empirical results show that OODP significantly outperforms previous methods in terms of generalization over novel environments with various object layouts. OODP is able to learn from very few environments and accurately predict dynamics in a large number of unseen environments. In addition, OODP learns semantically and visually interpretable dynamics models.
- North America > United States (0.14)
- Asia > Singapore (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- (2 more...)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.92)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine > Therapeutic Area > Oncology (0.67)
VastTrack: Vast Category Visual Object Tracking
V astTrack consists of a few attractive properties: (1) V ast Object Category . In particular, it covers targets from 2,115 categories, significantly surpassing object classes of existing popular benchmarks ( e.g ., GOT -10k with 563 classes and LaSOT with 70 categories). Through providing such vast object classes, we expect to learn more general object tracking.
- North America > United States > Texas (0.14)
- South America > Brazil (0.04)
- Oceania > New Zealand (0.04)
- (10 more...)
- Transportation > Passenger (1.00)
- Transportation > Infrastructure & Services (1.00)
- Transportation > Ground > Road (1.00)
- (12 more...)
- Information Technology > Artificial Intelligence > Vision (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.90)
- Information Technology > Artificial Intelligence > Robots (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe (0.04)
- Asia > South Korea > Gyeongsangbuk-do > Pohang (0.04)
- (2 more...)
- Health & Medicine (0.67)
- Information Technology > Security & Privacy (0.46)
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.67)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Sensing and Signal Processing > Image Processing (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.64)
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
- Europe > Switzerland > Zürich > Zürich (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.68)
- Information Technology > Artificial Intelligence > Robots (0.67)
Efficient Meta Neural Heuristic for Multi-Objective Combinatorial Optimization (Appendix) A Model architecture The architecture of the base model in meta-learning is the same as POMO [ 26
Each sublayer adds a skip-connection (ADD) and batch normalization (BN). The decoder sequentially chooses a node according to a probability distribution produced by the node embeddings to construct a solution. The scaled symmetric sampling method is shown in Algorithm 2. The scaled factor The uniform division of the weight space is illustrated as follows. Thus, its approximate Pareto optimal solutions are commonly pursued. V ehicles must serve all the customers and finally return to the depot.
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.40)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.40)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.34)